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Z 7 Quasi- (that is, Sub-) Random Sequences 

We have just seen that choosing N points uniformly randomly in an n- 

dimensional space leads to an error term in Monte Carlo integration that decreases ^ 

as 1/ y/N. In essence, each new point sampled adds linearly to an accumulated sum 1 1 1 f l 
that will become the function average, and also linearly to an accumulated sum of S ^ i i ^ 
squares that will become the variance (equation 7.6.2). The estimated error comes 7l 3 "f c 
from the square root of this variance, hence the power iV 1 1 1 1 1 

_ Just because this square root convergence is familiar does not, however, mean 

^ that it is inevitable. A simple counterexample is to choose sample points that lie 

^ on a Cartesian grid, and to sample each grid point exactly once (in whatever order). 

CO The Monte Carlo method thus becomes a deterministic quadrature scheme — albeit 

S a simple one — whose fractional error decreases at least as fast as JV^ (even faster > | - 

fy if the function goes to zero smoothly at the boundaries of the sampled region, or l^i* 1 ' 

yf is periodic in the region). S 1 i o j 

U\ The trouble with a grid is that one has to decide in advance how fine it should ^ % of. \ 

M= be. One is then committed to completing all of its sample points. With a grid, it is V ^1 5 ! 

not convenient to "sample untir some convergence or termination criterion is met. 1 o § ^ j 
One might ask if there is not some intermediate scheme, some way to pick sample » « 1 1 1 
points "at random," yet spread out in some self-avoiding way, avoiding the chance | z | ^ | 
clustering that occurs with uniformly random points. ^ 1 ti m < 

A similar question arises for tasks other than Monte Carlo integration. We might 1 1 1 z 1 
want to search an n-dimensional space for a point where some (locally computable) 
'Ci condition holds. Of course, for the task to be computationally meaningful, there 

had better be continuity, so that the desired condition will hold in some finite n- 
dimensional neighborhood. We may not know a priori how large that neighborhood 
is, however. We want to "sample untir the desired point is found, moving smoothly 
to finer scales with increasing samples. Is there any way to do this that is better 
than uncorrelated; random samples? 

The answer to the above question is "y^s." Sequences of n-tuples that fill 
n-space more uniformly than uncorrelated random points are called quasi-random 
sequences. That term is somewhat of a misnomer, since there is nothing "random" 
about quasi-random sequences: They are cleverly crafted to be, in fact, j«i-random. 
The sample points in a quasi-random sequence are, in a precise sense, "maximally 
avoiding" of each other. 

A conceptually simple example is Halton *s sequence (1 ]. In one dimension, the 
jth number Hj in the sequence is obtained by the following steps: (i) Write j as a 
number in base 6, where b is some prime. (For example j = 17 in base 6 = 3 is 
122.) (ii) Reverse the digits and put a radix point (i.e.. a decimal point base 6) in 
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0 .2 .4 .6 .8 1 
points 1 to 128 




.2 .4 .6 .8 
points 513 to 1024 




.2 .4 .6 .8 
points 129 to 512 




.2 .4 ,6 .8 
points 1 to 1024 



Figure 7.7.1. Hrst 1024 points of a two-dimensional SoboK sequence. The sequence is generated 
number-theoretically, rather than randomly, so successive points at any stage "know** how to fill in the 
gaps in the previously generated distribution. 



front of the sequence. (In the example, we get 0.221 base 3.) The result is Hy To 
get a sequence of n-tuples in n-space, you make each con^nent a Halton sequence 
with a different prime base 6. Typically, the first n primes are used. 

It is not hard to see how Halton*s sequence works: Every time the number of 
digits in j increases by one place, j's digit-reversed fraction becomes a factor of 
h finer-meshed. Thus the process is one of filling in all the points on a sequence 
of finer and finer Cartesian grids — and in a kind of maximally spread-out order 
on each grid (since, e.g., the most rapidly changing digit in j controls the most 
significant digit of the fraction). 

Other ways of generating quasi-random sequences have been suggested by 
Faure, Sobol', Niederreiter, and others. Bratley and Fox (2) provide a good review 
and references, and discuss a particularly efficient variant of the Sobol' [3] sequence 
suggested by Antonov and Saleev [4]. It is this Antonov-Saleev variant whose 
implementation we now discuss. 
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Degree 



Primitive Polynomials Modulo 2* 



I 0 (i.c.. X + 1) 



2 


1 (i.e.. + X + 1) 


3 


1. 2 (i.e.. x3 + I + 1 and x^ + x^ + 1) 


4 


I, 4 (i.e., X* + X + 1 and x^ + x^ + 1) 


5 


2, 4. 7. 11. 13, 14 


6 


1. 13. 16. 19. 22. 25 


7 


1, 4. 7, 8, 14. 19. 21. 28, 31. 32, 37. 41. 42, 50. 55, 56, 59, 62 


8 


14. 21. 22. 38. 47, 49. 50. 52, 56, 67. 70, 84. 97. 103. 1 15. 122 


9 


8. 13, 16. 22. 25. 44, 47, 52. 55, 59, 62. 67. 74. 81. 82. 87. 91. 94, 103, 104. 109. 122, 
124. 137. 138. 143. 145, 152, 157, 167, 173, 176, 181, 182, 185, 191, 194, 199,218.220, 
227, 229, 230. 234, 236. 241, 244, 253 


10 


4, 13, 19, 22, 50, 55, 64, 69. 98, 107, 115, 121. 127, 134, 140. 145, 152, 158. 161. 171, 
181. 194. 199, 203, 208, 227, 242. 251, 253. 265, 266. 274. 283. 289, 295. 301. 316, 
319, 324, 346, 352, 361, 367. 382, 395, 398, 400, 412, 419. 422, 426, 428, 433, 446, 
454, 457, 472. 493, 505, 508 


^Expressed as a decimal integer representing the interior bits (that is, omitting the 
high-order bit and the unit bit). 



The Sobol* sequence generates numbers between zero and one directly as binary fractions 
of length w bits, from a set of w special binary fractions, Vi , i = 1, 2, . . . , called direction 
numbers. In SoboFs original method, the j'th number Xj is generated by XORing (bitwise 
exclusive or) together the set of K's satisfying the criterion on t, "the ith bit of j is nonzero." 
As j increments, in other words, different ones of the Vi 's flash in and out of Xj on different 
time scales. W alternates between being present and absent most quickly, while Vk goes from 
present to absent (or vice versa) only every 2* ^ steps. 

Antonov and Saleev's contribution was to show that instead of using the bits of the 
integer J to select direction numbers, one could just as well use the bits of the Gray code of j, 
G(J). (For a quick review of Chay codes, look at §20.2.) 

Now G(J) and G{j -h 1) differ in exactly one bit position, namely in the position of the 
rightmost zero bit in the binary representation of j (adding a leading zero to j if necessary). A 
consequence is that the j + 1st Sobol' -Antonov-Saleev number can be obtained fit)m the jth 
by XORing it with a single Vu namely with i the position of the rightmost zero bit in j. This 
makes the calculation of the sequence very efficient, as we shall see. 

Figure 7.7. 1 plots the first 1024 points generated by a two-dimensional Sobol' sequence. 
One sees that successive points do "know" about the gaps left previously, and keep filling 
them in, hierarchically. 

We have deferred to this point a discussion of how the direction numbers Vi are generated. 
Some nontrivial mathematics is involved in that, so we will content ourself with a cookbook 
summary only: Each different Sobol' sequence (or component of an n-dimensional sequence) 
is based on a different primitive polynomial over the integers modulo 2, that is, a polynomial 
whose coefficients are either 0 or 1, and which cannot be factored (using modulo 2 integer 
arithmetic) into polynomials of lower order. (Primitive polynomials modulo 2 were used in 
§7.4, and are further discussed in §20.3.) Suppose P is such a polynomial, of degree q. 
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Initializing Values Used in eobsaq 


Degree 


Polynomial 


Starting Values 


1 


0 




(3) 


(5) 


(15) ... 


2 






1 


(7) 


01) ... 


3 






3 


7 


(5) ... 


3 






3 


3 


(15) ... 


4 








3 


13 ... 


4 








5 


9 ... 


Parenthesized values are not freely specifiable, but are forced by the required recunence 
for this degree. 



Define a sequence of integers M{ by the 9-lenn recunence relation, 
Mi = 2axMi. 1 e 2*02^1. a e • • e 2« ,+ia,. 1 0 (2«M«. , e Mi. ,) (7.7.2) 

Here bitwise XOR is denoted by ®. The starting values for this recurrence are that Af i , . . . , M , 
can be arbitrary odd integers less than 2, . . . , 2', respectively. Then, the direction numbers 
Vi are given by 

i=l,....ti; (7.7.3) 

The accompanying table lists all primitive polynomials modulo 2 with degree q < 10. 
Since the coefficients are either 0 or 1, and since the coefficients of and of 1 are predictably 
1 , it is convenient to denote a polynomial by its middle coefficients taken as the bits of a binary 
number (higher powers of x being more significant bits). The table uses this convention. 

Turn now to the implementation of the Sobol' sequence. Successive calls to the function 
sobseq (after a preliminary initializing call) return successive points in an n-dimensional 
Sobol* sequence based on the first n primitive polynomials in the table. As given, the 
routine is initialized for maximum n of 6 dimensions, and for a word length w of 30 bits. 
These parameters can be altered by changing MAXBIT (= w) and NAXDIM. and by adding 
more initializing data to the arrays Ip (the primitive polynomials finom the table), mdeg (their 
degrees), and iv (the starting values for the recurrence, equation 7.7.2). A second table, 
above, elucidates the initializing data in the routine. 

•include "nrutil.h" 
«def Ine KAXBIT 30 
«def Ine HAXDIH 6 

void sob8eq(int *n, float xC3) 

When n is negative, internally initializes a set of MAXBIT direction numbers for each ofMAXDIM 
different Sobol' sequences. When n is positive (but <MAXDIM), returns as the vector x[l . .n] 
the next values from n of these sequences, (n must not be changed between initializations.) 

int J.k.l; 

unsigned long i.im.lpp; 
static float fac; 

static unsigned long in,ix[MAXDIM+l3 ,*iuCMAXBIT+l] ; 
static unsigned long mdegCMAXDIM+l]-{0,i,2,3,3,4,4>; 
static unsigned long ipCMAXDIM+l]-C0,0,l,l,2,l,4>; 
static unsigned long ivCMAXDIM»MAXBIT+l]-C 

0.1,1, 1.1,1, 1,3. t, 3. 3, 1,1, 5, 7, 7, 3, 3, 5, 15, 11. 5, 15, 13, 9}; 



if (♦n < 0) { 

for (k-l;k<-MAXDIM;k-M.) ixCk]-0; 



Initialize, don't return a vector. 
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ln«0; 

if (ivCl] !• 1) return; 
fac-1.0/(lL « KiXBIT); 

for (j-l.k^;j<-IUXBIT;J*+,k+-HUDIM) luCj] - AlvCk] ; 
To allow both ID and 2D addressing, 
for (k-l;k<-KAXDIN:k^) { 

for (j-l;J<-ad«gCk];J'M.) luCjlCk] «- (MAXBIT-J); 

Stored values only require normalization. 

for (j-md«gCk]'t'l;j<-KAXBIT;J++) { Use the recurrence to get other va^ 
ipp-lpCk}; ues. 
i-luCj-mdegCk]]Ck3; 
1 (1 » ndegCk]); 
for <l-adegCk]-l;l>-l;l— ) { 

if (Ipp » 1) 1 iuCj-lDCk]; 

ipp »- 1; 

} 

iuCjlCk]-!; 

> 

> 

•Ise i Calculate the next vector in the se- 

lm-in+>; quence. 
for (j-l;j<-MAXBIT; j-M.) { Rnd the rightmost zero bit. 

if (Kim k 1)) break; 

im »- 1; 

> 

if (J > MAXBIT) nrerrorCxMAXBIT too small in sobseq"); 
im«(j-l)»KAXDIM; 

for (k-1 ;k<-IMIH(*ii,MAXDIM) ;k++) { XOR the appropriate direction nunv 

ixCk] *■ ivCiffl'»-k] ; ber into each component of the 

X DO -ix [k] *f ac ; vector and convert to a floating 

} number. 



How good is a Sobol' sequence, anyway? For Monte Cario integration of a smooth 
function in n dimensions, the answer is that the fractional error will decrease with TV, the 
number of samples, as (In N)^/N, i.e.. almost as fast as 1/iV. As an example, let us integrate 
a function that is nonzero inside a tonis (doughnut) in three-dimensional space. If the major 
radius of the torus is Ao* the minor radial coordinate r is defined by 



r=([(x^ + »Y^='-flol'+-^y (7.7.4) 
Let US tiy the function 



/(x,y,z)=^l+«»|^-^j '•<'-o (7.7.5) 

r > ro 

which can be integrated analytically in cylindrical coordinates, giving 



///' 



dxdydz /(x, y, z) = 2TrarRo (7.7.6) 

With parameters Ro = 0.6, ro = 0.3. we did 100 successive Monte Carlo integrations of 
equation (7.7.4). sampling uniformly in the region -1 < x,y,z < 1, for the two cases of 
unconelated random points and the Sobol* sequence generated by the routine sobseq. Ingure 
7.7.2 shows the results, plotting the r.m.s. average error of the 100 integrations as a function 
of the number of points sampled. (For any single integration, the error of course wanders 
fipom positive to negative, or vice versa, so a logarithmic plot of fractional enor is not very 
informative.) The thin, dashed curve corresponds to uncorrelated random points and shows 
the familiar N' asymptotics. The thin, solid gray curve shows the result for the SoboP 
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number of points N 

Figiiie 7.7^. Fractional accuracy of Monte Cario integrations as a function of number of points sampled, 
for two different integrands and two different methods of choosing random points. The quasi-random 
Sobol* sequence converges much more rapidly than a conventional pseudo-random sequence. Quasi- 
random sampling does better when the integrand is smooth (**soft boundary**) than when it has step 
discontinuities Chard boundary**). The curves shown are die rjn.s. average of 100 trials. 

sequence. The logarithmic tenn in the expected {hiN)^/N is readily apparent as curvature 
in the curve, but the asymptotic is unmistakable. 

To understand the importance of Hgure 7.7.2, suppose that a Monte Cario integration of 
/ with 1% accuracy is desired. The Sobol' sequence achieves this accuracy in a few thousand 
samples, while pseudorandom sampling requires neariy 100,000 samples. The ratio would 
be even greater for higher desired accuracies. 

A different, not quite so favorable, case occurs when the function being integrated has 
hard (discontinuous) boundaries inside the sampling region, for example the function that is 
one inside the torus, zero outside. 



/(x,y,z)={J 



r < To 
r > ro 



(7.7.7) 



where r is defined in equation (7.7.4). Not by coincidence, this function has the same analytic 
integral as the function of equation (7,7.5), namely 27r'a^/io- 

The carefully hierarchical Sobol* sequence is based on a set of Cartesian grids, but the 
boundary of the torus has no particular relation to those grids. The result is that it is essentially 
random whether sampled points in a thin layer at the surface of the torus, containing on the 
order of N^^^ points, come out to be inside, or outside, the torus. The square root law, applied 
to this thin layer, gives N^^^ fluctuations in the sum, or N- fractional error in the Monte 
Cario integral. One sees this behavior verified in Figure 7.7.2 by the thicker gray curve. The 
thicker dashed curve in Figure 7.7.2 is the result of integrating the function of equation (7.7.7) 
using independent random points. While the advantage of the SoboP sequence is not quite so 
dramatic as in the case of a smooth function, it can nonetheless be a significant factor (^5) 
even at modest accuracies like 1%, and greater at higher accuracies. 
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Note that we have not provided the routine soba^q with a means of stalling the 
sequence at a point other than the beginning, but this feature would be easy to add. Once 
the initialization of the direction numbers Iv has been done, the jdi point can be obtained 
direcdy by XORing together those direction numbers conesponding to nonzero bits in the 
Gray code of j, as described above. 



TTiB Latin Hypercube 

We might here give passing mention the unrelated technique of Latin square or 
LxLtin hypercube sampling, which is useful when you must sample an iV-dimensional 
space exceedingly sparsely, at M points. For example, you may want to test the 
crashworthiness of cars as a simultaneous function of 4 different design parameters, 
but with a budget of only three expendable cars. (The issue is not whether this is a 
good plan — it isn't — but rather how to make the best of the situation!) 

The idea is to partition each design parameter (dimension) into M segments, so 
that the whole space is partitioned into cells. (You can choose the segments in 
each dimension to be equal or unequal, according to taste.) With 4 parameters and 3 _ _ 
cars, for example, you end up with 3x3x3x3 = 81 cells. f 1 1 f 2 

Next, choose M cells to contain the sample points by the following algorithm: ? 8 ^ Jf p 
Randomly choose one of the cells for the first point. Now eliminate all cells i f g 

^ that agree with this point on any of its parameters (that is, cross out all cells in the 1 II f i 

^ same row, colunm, etc.). leaving (Af - 1)^ candidates. Randomly choose one of 

S these, eliminate new rows and colunms, and continue the process until there is only 

\^ one cell left, which then contains the final sample point. alf ?! '1 

^ The result of this construction is that each design parameter will have been ^ ||?m 

tested in every one of its subranges. If the response of the system under test is 5 1 1 1- 3 
dominated by one of the design parameters, that parameter will be found with 
this sampling technique. On the other hand, if there is an important interaction 
among different design parameters, then the Latin hypercube gives no particular 
advantage. Use with care. 
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7.8 Adaptive and Recursive Monte Carlo 
Methods 

This section' discusses more advanced techniques of Monte Carlo integration. As 
examples of the use of these techniques, we include two rather different, fairly sophisticated, 
multidimensional Monte Carlo codes: vegas[1.2], and aiser[3). The techniques that we 
discuss all fall under the general lubric of reduction of variance (§7.6), but are otherwise 
quite distinct 

Importance Sampling 

The use of importance sampling was already implicit in equations (7.6.6) and (7.6.7). 
We now return to it in a slightly more formal way. Suppose that an integrand / can be written 
as the product of a function h that is almost constant times another, positive, function g. Then 
its integral over a multidimensional volume V is 9 <£- £ 

j j if/9) 9dV = l hgdV (7.8.1) | 

In equation (7.6.7) we interpreted equation (7.8.1) as suggesting a change of variable to 1^ f* 

G, the indefinite integral of g. That made gdV a perfect differential. We then proceeded !S | § 
to use the basic theorem of Monte Carlo integration, equation (7.6.1). A more general 

interpretation of equation (7.8.1) is that we can integrate / by instead sampling h — not, z.^ 
however, with uniform probability density dV, but rather with nonuniform density gdV, In 

this second interpretation, the first interpretation follows as the special case, where the means > s-a 

of generating the nonuniform sampling of gdV is via the transformation method, using the i ^ g- 

indefinite integral G (see §7.2). §"| ^ 

More direcdy, one can go back and generalize die basic theorem (7.6.1) to the case § § ^ 

of nonuniform sampling: Suppose that points Xi are chosen within the volume V with a ? I 
probability density p satisfying 



The generalized fundamental theorem is that the integral of any function / is estimated, using ^^ '-n 



pdV=l (7.8.2) 

tmistl 

N sample points x*, . . . , xjv. by 

/.//^ = /ip«'.(/)i/i£^p^ (7.«, 

where angle brackets denote arithmetic meaiis over the N points, exacUy as in equation 
(7.6.2). As in equation (7.6.1), the •*plus-or-minus" term is a one standard deviation error a 5 I © 

estimate. Notice that equation (7.6.1) is in fact the special case of equation (7.8.3). with 1" t J ^? IB 

p = constant = , |§1 |X 

What is the best choice for the sampling density p? Intuitively, we have already ^ ^ ^ ^ 

seen that the idea is to make h = f/p as close to constant as possible. We can be more o ^ 

rigorous by focusing on the numerator inside the square root in equation (7.8.3). which is > S ^ " 

the variance per sample point. Both angle brackets are themselves Monte Carlo estimators i s ^ 

of integrals, so we can write g " §• 

*^(?)-(?)'^/?'^-[/?H'=/p'^-[/H" 

We now find the optimal p subject to the constraint equation (7.8.2) by the functional variation 

l^dV-[lfdv]\xlpdV^ (7.8.5) 



